JOURNAL OF GEOPHYSICAL RESEARCH, VOL. xx, NO. xx, PAGES 1?? , FEBRUARY 2002 


Kalman Filter Chemical Data Assimilation: 
A Case Study in January 1992 

D. J. Lary 

Data Assimilation Office, NASA Goddard Space Flight Centre, Greenbelt, MD 


B. Khattatov 

NCAR, Boulder, CO


H. Mussa 

Department of Chemistry, University of Cambridge, England 


Abstract. This paper describes a Kalman filter chemical data assimilation
system and its use for analysing a vertical atmospheric profile during
January 1992. The vertical profile was at an equivalent PV latitude
($\phi_e$) of 55°S and consisted of 21 potential temperature ($\theta$)
levels spaced equally in log($\theta$) between 400 K and 2000 K. This
equivalent latitude was chosen as it was well observed during January 1992
by instruments on board the Upper Atmosphere Research Satellite (UARS).


1. Introduction 

Measurements of atmospheric constituents made over the last decade or more
have cost many millions of dollars/pounds to make. The intelligent use of
these data on a wide variety of species is a non-trivial task as the
observations are not co-located in time or space. Satellites make
measurements of atmospheric constituents by a range of methods, and at a
range of times and locations. The measurements are not made on a regular
spatial grid or at the same times of day. Because the analysis of satellite
measurements is so complex, the measurements have not been used to their
full potential.

In comparison to the analysis of meteorological variables, the analysis of
chemical trace species has received little attention. Current methods tend
to be simple comparisons of observations with a model (which are not
necessarily constrained to be directly comparable) and/or treat species
independently, ignoring the complex balances which exist between species.
Moreover, the large diurnal variations in the concentrations of many species
are either accounted for in very simple ways, or avoided by analysing
concentrations at a fixed local time. This is a great shame, as the shape of
a species' diurnal cycle, and the relative partitioning between species,
contain a lot of valuable information that is completely wasted unless we
use a technique that can exploit it. Naturally, such information can only be
exploited if the technique includes a theoretical understanding of the
chemical system. Data assimilation is a valuable aid in making better use of
observations of atmospheric chemistry. This paper describes a Kalman filter
for chemical data assimilation, with observation quality control and
analysis skill assessment, cast in flow-tracking coordinates.

2. Flow-Tracking Coordinate System

We want to look at the detailed interactions between chemical species and
exploit the propagation of information between chemical species by using a
Kalman filter which calculates the time evolution of the full covariance
matrix. This is expensive, so as a first step we take a Lagrangian approach.
To give us global analyses we then use a two-dimensional array of
independent time-evolving chemical box models (described in Section 3). This
two-dimensional array is arranged in an equivalent PV latitude, potential
temperature flow-tracking coordinate system [Lary et al., 1995a]. This
approximation is certainly valid for our analysis interval of one day, and
often for up to ten days. It is a way of largely separating the effects of
chemistry and dynamics. Because a major component of the variability of
trace gases is due to atmospheric motions, we perform our data assimilation
in a coordinate system that 'moves' with the large-scale flow pattern.

In addition, the Kalman filter chemical data assimilation is computationally
expensive: one diurnal cycle for one vertical profile takes 35 minutes of
computer time on a 1.7 GHz Intel Pentium IV computer (this includes the
first guess, assimilation, and analysis runs). It is therefore useful to
obtain a global 2D assimilation by using an equivalent PV latitude
($\phi_e$), potential temperature ($\theta$) coordinate system. Our grid has
21 potential temperature levels spaced equally in log($\theta$) between
400 K and 2000 K, and 32 equivalent PV latitudes spaced evenly between -90°
and 90°. Here we consider in detail just one of these 32 profiles, the one
at 55°S, as it was well observed during January 1992 by instruments aboard
the Upper Atmosphere Research Satellite (UARS).
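
The grid just described can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function names are hypothetical, and it assumes that both end points (400 K and 2000 K, -90° and 90°) are themselves grid points, which the text does not state explicitly.

```python
import math

def theta_levels(n=21, theta_min=400.0, theta_max=2000.0):
    """Potential temperature levels (K) spaced equally in log(theta)."""
    step = (math.log(theta_max) - math.log(theta_min)) / (n - 1)
    return [math.exp(math.log(theta_min) + k * step) for k in range(n)]

def equivalent_latitudes(n=32, lat_min=-90.0, lat_max=90.0):
    """Equivalent PV latitudes (degrees), evenly spaced.

    Assumption: the end points are included as grid points.
    """
    step = (lat_max - lat_min) / (n - 1)
    return [lat_min + k * step for k in range(n)]

levels = theta_levels()
lats = equivalent_latitudes()

# Index of the grid latitude closest to 55 degrees South.
i_55s = min(range(len(lats)), key=lambda i: abs(lats[i] + 55.0))
```

Equal spacing in log($\theta$) means that the ratio between successive levels is constant, so the levels are packed more closely in the lower stratosphere.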

3. Chemical Scheme 

We use the extensively validated AutoChem model described elsewhere
[Fisher and Lary, 1995; Lary et al., 1995b; Lary, 1996]. The model is
explicit and uses an adaptive-timestep, error-monitoring time integration
scheme for stiff systems of equations [Stoer and Bulirsch, 1980; Press et
al., 1992]. AutoChem was the first model to have the facility to perform 4D
variational data assimilation (4D-Var) [Fisher and Lary, 1995] and now also
includes a Kalman filter [Khattatov et al., 1999]. AutoChem uses kinetic
data largely based on DeMore et al. [1997] and Atkinson et al. [1997].

Our usual chemical system contains a total of 60 species. 55 species are
time integrated, namely: O(1D), O(3P), O3, N, NO, NO2, NO3, N2O5, HONO,
HNO3, HO2NO2, CN, NCO, HCN, Cl, Cl2, ClO, ClOO, OClO, Cl2O2, ClNO2, ClONO,
ClONO2, HCl, HOCl, CH3OCl, Br, Br2, BrO, BrONO2, BrONO, HBr, HOBr, BrCl, H2,
H, OH, HO2, H2O2, CH3, CH3O, CH3O2, CH3OH, CH3OOH, CH3ONO2, CH3O2NO2, HCO,
HCHO, CH4, CH3Br, CF2Cl2, CO, N2O, CO2, H2O. The remaining 5 species are not
integrated and not in photochemical equilibrium, namely: O2, N2, HCl(s),
H2O(s), HNO3(s). The model contains a total of 420 reactions: 278
bimolecular reactions, 32 trimolecular reactions, 60 photolysis reactions,
4 cosmic ray processes, and 46 heterogeneous reactions.

3.1. Radiative Transfer Calculations 

A key part of the chemical model is the calculation of photolysis rates. In
this study photolysis rates are calculated using full spherical geometry and
multiple scattering [Lary and Pyle, 1991a, b; Meier et al., 1982; Nicolet et
al., 1982] with a treatment of spherical geometry [Anderson, 1983]. The
photolysis rate used for each time step is obtained by ten-point
Gauss-Legendre integration [Press et al., 1992]. These calculations are
updated on every assimilation iteration to ensure that the improved ozone
profile at a given equivalent latitude is used to calculate the photolysis
rates.
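
As an illustration of ten-point Gauss-Legendre integration, the sketch below computes the quadrature nodes and weights by Newton iteration on the Legendre polynomial and uses them to average a rate over one time step. It assumes that the quadrature is applied in time across the step, which is one reading of the text; `timestep_mean` and the `rate` callable are hypothetical names and do not come from AutoChem.

```python
import math

def gauss_legendre(n):
    """Nodes and weights of n-point Gauss-Legendre quadrature on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        # Initial guess for the i-th root of the Legendre polynomial P_n.
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))
        for _ in range(100):
            # Evaluate P_n(x) (p1) and P_{n-1}(x) (p0) by recurrence.
            p0, p1 = 1.0, x
            for k in range(2, n + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = n * (x * p1 - p0) / (x * x - 1.0)  # derivative of P_n
            dx = p1 / dp
            x -= dx  # Newton step towards the root
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def timestep_mean(rate, t_start, t_end, n=10):
    """Mean of rate(t) over [t_start, t_end] by n-point quadrature."""
    nodes, weights = gauss_legendre(n)
    half = 0.5 * (t_end - t_start)
    mid = 0.5 * (t_end + t_start)
    # Nodes are mapped from [-1, 1] to the time step; the factor 0.5
    # converts the integral into a mean over the step.
    return 0.5 * sum(w * rate(mid + half * x)
                     for x, w in zip(nodes, weights))
```

A ten-point rule integrates polynomials up to degree 19 exactly, so a rate that varies smoothly across the time step is averaged very accurately with only ten evaluations.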

4. Quality Control 

Observation quality control is a central part of chemical data assimilation.
Our system transforms the observations into a flow-tracking coordinate
system. We then combine many observations into a single pseudo observation
profile, and deal with one complete pseudo observation profile at a time.

An observation is used only if the ratio of the observational error,
$\sigma$, to the observed concentration, $\chi$, which we will call the
quality ratio, $Q_r$, does not exceed a specified threshold (1).

$$Q_r = \frac{\sigma}{\chi} \qquad (1)$$

The value of this threshold is specified for each observed species
separately, based on the characteristics of the instrument involved. This
criterion has proved to be important in removing rogue observations and is
recommended to others engaged in chemical data assimilation. In addition,
there is an option whereby an observation is only used if, with its
associated uncertainty, it overlaps the current analysis concentration and
its associated uncertainty (this criterion is not always applied, as it can
lead to an incestuous relationship between the observations chosen and the
analyses). We usually have two iterations of the Kalman filter, the first
using all observations that passed the quality ratio test, and the second
where the analysis state is also used to perform quality control.

4.1. Generation of the Pseudo Observations

Generating a pseudo observation is necessary for the assimilation as both
the observation and the analysis need to refer to the same location. In
addition, it improves the signal-to-noise ratio, as many observations are
used in forming just one pseudo observation.

We have chosen to deal with a profile of pseudo observations at a time as
the two quality control criteria mentioned above can lead to gaps in our
vertical profile. These are easy to fill in by eye, but for an algorithm to
deal with the data voids we need to consider an entire profile at a time in
our flow-tracking coordinate space. In a full 3D assimilation the 3D
covariance matrix would perform this task. However, in this study we are
using a full Kalman filter with detailed chemistry. To make this
computationally achievable we use multiple 0D box models which are stacked
into a series of profiles. As mentioned earlier, this gives us a 2D global
assimilation with 21 potential temperature levels spaced equally in
log($\theta$) between 400 K and 2000 K, and 32 equivalent PV latitudes
spaced evenly between -90° and 90°.

The key point about our generation of pseudo observations is that we deal
with the ratio of the observed concentration, $\chi_O$, to the analysis
concentration, $\chi_A$, which we will call the observed concentration
ratio, $R$, not with the observed concentrations directly (2).

$$R = \frac{\chi_O}{\chi_A} \qquad (2)$$

Here the analysis concentration is interpolated to the location of each
observation in turn. We then treat the many observation points that fall
between the bottom and the top of a given grid box as a distribution of
observed concentration ratios, and take the median of this distribution as
the observed concentration ratio for that grid box. The median is used as it
is not affected by any large outliers in the distribution. If there are more
than a threshold number of observations, N (usually 10), then we just take
the median of the N most accurate observations. The pseudo observation for a
grid box, $\chi_{pseudo}$, is then simply the product of the median observed
concentration ratio, $R_{median}$, and the current analysis concentration,
$\chi_A$ (3).

$$\chi_{pseudo} = R_{median} \, \chi_A \qquad (3)$$

If we have any gaps in the vertical pseudo observation profile we simply
linearly interpolate the median observed concentration ratio from the
available points above and below the gap. Since this ratio is generally
quite close to one the interpolation is rather good, and better than
performing an interpolation in concentration units: the concentration can
change by more than an order of magnitude over the profile and contain
strong gradients, whereas R is generally close to one and does not contain
strong gradients.
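
The quality ratio test, the median ratio per grid box, the gap filling and the pseudo observation product can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, "most accurate" is taken to mean the smallest quality ratio, and gaps are interpolated linearly in level index (which, on a grid equally spaced in log($\theta$), is linear in log($\theta$)).

```python
import statistics

def quality_ratio(sigma, chi):
    """Observational error divided by observed concentration."""
    return sigma / chi

def box_ratio(obs, analysis_at_obs, threshold, n_best=10):
    """Median observed concentration ratio R for one grid box.

    obs is a list of (chi_o, sigma) pairs; analysis_at_obs holds the
    analysis concentration interpolated to each observation location.
    Returns None if no observation passes the quality ratio test.
    """
    kept = [(quality_ratio(s, c), c / a)  # (Q_r, R) for each observation
            for (c, s), a in zip(obs, analysis_at_obs)
            if c > 0.0 and quality_ratio(s, c) <= threshold]
    if not kept:
        return None
    # Assumption: "most accurate" means smallest quality ratio.
    kept.sort(key=lambda pair: pair[0])
    return statistics.median(r for _, r in kept[:n_best])

def fill_gaps(ratios):
    """Fill None entries by linear interpolation between the nearest
    filled levels below and above; gaps at the profile ends stay None."""
    filled = list(ratios)
    known = [i for i, r in enumerate(ratios) if r is not None]
    for i, r in enumerate(ratios):
        if r is None:
            below = max((k for k in known if k < i), default=None)
            above = min((k for k in known if k > i), default=None)
            if below is not None and above is not None:
                w = (i - below) / (above - below)
                filled[i] = (1.0 - w) * ratios[below] + w * ratios[above]
    return filled

def pseudo_observations(ratios, analysis):
    """Pseudo observation = median ratio times analysis, level by level."""
    return [None if r is None else r * a
            for r, a in zip(ratios, analysis)]
```

Working with ratios keeps the interpolated quantity close to one, so a simple linear fill is far less sensitive to the strong vertical gradients in concentration.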

4.2. Observation Uncertainties 

The uncertainty of the pseudo observation has two components. First, the
observational uncertainty, which is taken to be the median of the
distribution of observed concentration uncertainties. Second, the
representativeness, which is taken to be the average deviation of the
distribution of observed concentrations for the grid box.

$$\sigma_{rep} = \mathrm{ADev}(\chi_1 \ldots \chi_N) = \frac{1}{N} \sum_{j=1}^{N} |\chi_j - \bar{\chi}| \qquad (4)$$

The average deviation, or mean absolute deviation, is a robust estimator of
the width of the distribution [Press et al., 1992].

It has been found desirable to include a moving average smoothing which
involves the current grid box, and the boxes above and below.

The usefulness of generating pseudo observations in this way is particularly
noticeable for those observational datasets that have gaps and those that
are rather noisy.
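
The two uncertainty components and the three-box smoothing can be sketched as below. The function names are hypothetical. The treatment of the top and bottom boxes (averaging over the boxes that exist) is an assumption, as the text does not say how the profile ends are handled, nor which profile is smoothed.

```python
import statistics

def observational_uncertainty(sigmas):
    """Median of the observed concentration uncertainties in a grid box."""
    return statistics.median(sigmas)

def representativeness(concentrations):
    """Average (mean absolute) deviation of the observed concentrations."""
    mean = sum(concentrations) / len(concentrations)
    return sum(abs(c - mean) for c in concentrations) / len(concentrations)

def smooth_profile(values):
    """Moving average over the current box and the boxes above and below.

    Assumption: at the ends of the profile only the existing boxes are
    averaged.
    """
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed
```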


4.3. Checking the Errors

It is often hard to know if the observation and a priori errors have been
correctly specified. In this particular study the best characterised error
is the representativeness error. The model error growth is taken to be 5%
per time step. The observation errors are taken directly from the values
specified with the observed concentrations by the retrieval teams. However,
these observational uncertainties could be in error, as was found by Menard
et al. [2000] and Menard and Chang [2000]. A useful check is

$$\langle (O - F)^2 \rangle \approx \sigma_O^2 + \sigma_F^2 \qquad (5)$$

where $\sigma_O$ and $\sigma_F$ are the observation and first guess errors.


5. Kalman Filter

The chemical Kalman filter [Khattatov et al., 1999] allows one to optimally
combine model simulations and measurements, taking into account their
respective uncertainties. Consider a model of a physical system represented
by an operator (generally nonlinear) $M$, and let vector $x$ with dimension
$N_x$ be a set of input parameters for the model. These input parameters are
used to predict the state of the system, vector $y$ with dimension $N_y$:

$$y = M(x) \qquad (6)$$

Assume that vector $x$ represents the state of a time-dependent numerical
photochemical model, i.e., concentrations of modeled species at model grid
points in the atmosphere. In the case of a box model that includes N
species, the dimension of vector $x$ would be N. We will now limit the
discussion to the case when $M$ is used to predict the state of the system
at some future time from past state estimates. Formally, in this case

$$x = x_t, \quad y = x_{t+\Delta t} \qquad (7)$$

and

$$x_{t+\Delta t} = M(t, x_t) \qquad (8)$$

Let vector $y_o$ contain observations of the state. Usually, the dimension
of $y_o$ is less than $N_y$, the dimension of the model space, since not all
model species are usually observed. The connection between $y_o$ and $y$ can
be established through the so-called observational operator $H$:

$$y_o = H(x) \qquad (9)$$

Combining the above two equations, we get

$$y_o = H(M(x)) \qquad (10)$$

We now assume that the probability density functions associated with $x$ and
$y$ can be satisfactorily approximated by Gaussian functions:

$$\mathrm{PDF}(x) \sim \exp\left( -\frac{(x - x_t)^T C^{-1} (x - x_t)}{2} \right) \qquad (11)$$

where $x_t$ is the true value of $x$ and $C$ is the corresponding error
covariance matrix. Its diagonal elements are the squared uncertainties
(variances) of the elements of $x$, and the off-diagonal elements represent
correlation between uncertainties of different elements of vector $x$. The
covariance matrix $C$ is defined as

$$C = \langle (x - x_t)(x - x_t)^T \rangle \qquad (12)$$

where angle brackets represent averaging over all available realizations of
$x$.

For most practical applications we need to introduce the linear
approximation. In the linear approximation we assume that for small
perturbations of the parameter vector, $\Delta x$, the following is
approximately true:

$$M(x + \Delta x) = M(x) + L \Delta x \qquad (13)$$

Formally, $L$ is the derivative of $M$ with respect to $x$:

$$L = \frac{dM}{dx} \qquad (14)$$

For small variations of $x$ one can show that the evolution of the error
covariance matrix $C_t$ is given by:

$$C_{t+\Delta t} = L C_t L^T + Q \qquad (15)$$

Matrix $Q$ is the error covariance matrix introduced to take into account
uncertainties of the model calculations. The Kalman filter equations are

$$x_{t+\Delta t} = M(t, x_t)$$

$$C_{t+\Delta t} = L C_t L^T + Q$$

$$\hat{x}_t = x_t + C_t H^T (H C_t H^T + O)^{-1} (y_o - H x_t) \qquad (16)$$

$$\hat{C}_t = C_t - C_t H^T (H C_t H^T + O)^{-1} H C_t \qquad (17)$$

At the end of each analysis period the model value ($x_t$) and the
corresponding observation ($y_o$) are 'mixed' (see (16)) with weights
inversely proportional to their respective errors to produce the analysis,
$\hat{x}_t$. Then the model is integrated forward in time starting from the
obtained analysis. Once an observation has been incorporated in the model,
the analysis error covariance should be updated to reflect this (see (17)).
In the absence of observations, the model state is updated using (8), while
the evolution of the error covariance is obtained from the linearized model
equations as in (15).

If no observations are available, then

$$\hat{x}_t = x_t \qquad (18)$$

$$\hat{C}_t = C_t \qquad (19)$$
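
A minimal sketch of one forecast step and one analysis step of the filter is given below. It is not the AutoChem implementation. It assumes a linear toy model $M(x) = Lx$, a single scalar observation so that the matrix inverse reduces to a division, and it reads $O$ in the analysis equations as the observation error variance, which is the usual meaning but is not spelled out in the text. All function names are hypothetical.

```python
def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    """Matrix-matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def forecast(x, C, L, Q):
    """Advance the state and covariance: x -> L x, C -> L C L^T + Q."""
    n = len(x)
    x_f = matvec(L, x)
    LCLt = matmul(matmul(L, C), transpose(L))
    C_f = [[LCLt[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    return x_f, C_f

def analysis(x, C, h, y_obs, obs_var):
    """Update state and covariance with one scalar observation y_obs = h . x."""
    n = len(x)
    Ch = matvec(C, h)                                     # C H^T
    s = sum(hi * ci for hi, ci in zip(h, Ch)) + obs_var   # H C H^T + O
    innovation = y_obs - sum(hi * xi for hi, xi in zip(h, x))
    gain = [c / s for c in Ch]
    x_a = [xi + g * innovation for xi, g in zip(x, gain)]
    # C is symmetric, so H C is the transpose of C H^T.
    C_a = [[C[i][j] - gain[i] * Ch[j] for j in range(n)] for i in range(n)]
    return x_a, C_a

# Two correlated species; only the first one is observed.
x_a, C_a = analysis([1.0, 2.0], [[0.04, 0.01], [0.01, 0.09]],
                    [1.0, 0.0], y_obs=1.2, obs_var=0.04)
```

In this toy case the second species is never observed, yet both its value and its variance change in the analysis because of the off-diagonal covariance. This is the mechanism by which information propagates between species.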

6. Skill Scores

Once the data assimilation analysis has been performed we need to quantify
how good the analysis is. This is done by generating a wide range of
statistics that compare the observations used in making the analysis with
the analysis itself. These statistics are presented on a web site
automatically created by our software.

The diagnostics/statistics are as follows: 

1. Observation Increment: the difference between the first guess and the
observations, also known as observed-minus-background differences or the
innovation vector [Daly, 1991]. This is probably the best measure of
forecast skill.

2. Analysis Increment: the difference between the first guess and the final
analysis, also known as analysis-minus-background differences or the
correction vector [Daly, 1991]. This is a good measure of model bias.

3. Cost (jargon for the accumulated difference between analyses and
observations), both globally and for each single location in the analyses.

4. Scatter plots of observations against analyses; these can visually
highlight any biases present.

5. Normal Probability Plots of (Observation - Analysis) Values: these are
useful graphs for assessing whether data come from a normal distribution.

6. Quantile-Quantile Plots of observations against analyses. A
quantile-quantile plot is useful for determining whether two samples come
from the same distribution (whether normally distributed or not). The
quantile-quantile plot has three graphical elements. The pluses are the
quantiles of each sample. The number of pluses is the number of data values
in the smaller sample. The solid line joins the 25th and 75th percentiles of
the samples. The dashed line extends the solid line to the extent of the
sample. Figure 2 shows quantile-quantile plots for the analyses and
observations from the assimilation presented here.

7. Mean Error (ME), Bias, or Analysis-Observations (A-O), both globally and
for each single location in the analyses.

$$\mathrm{ME} = \frac{1}{n} \sum_{k=1}^{n} (y_k - o_k) \qquad (20)$$

where $o_k$ denotes the kth observation (or pseudo observation) and $y_k$
the corresponding value from the analyses. This is a useful measure of the
bias between the observations and analyses. Figure 1 shows examples of bias
vertical profiles for the January 1992 test case considered here. Typically
the bias is an order of magnitude less than the concentrations and within
the analysis error.

8. Global Histograms of (Observation - Analysis) 
Values The difference between the first guess and the final 
analyses. 

9. Mean Absolute Error (MAE), both globally and for each single location in
the analyses.

$$\mathrm{MAE} = \frac{1}{n} \sum_{k=1}^{n} |y_k - o_k| \qquad (21)$$

10. Mean Square Error (MSE), both globally and for each single location in
the analyses.

$$\mathrm{MSE} = \frac{1}{n} \sum_{k=1}^{n} (y_k - o_k)^2 \qquad (22)$$

11. Root Mean Square Error (RMSE), both globally and for each single
location in the analyses.

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \qquad (23)$$

Figure 1 shows that, as would be expected, in many cases the bias is
anticorrelated with the analysis increment. In other words, the assimilation
process is trying to correct the bias that exists between the observations
and the model. This is why the bias and analysis increments are very useful
statistics in assessing the quality of both the model and observations.
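
The four error statistics defined in the list above can be computed together in a few lines. The sketch below is illustrative only and the function name is hypothetical. It takes the analysis values $y_k$ and the matching observations (or pseudo observations) $o_k$ as two equal-length lists.

```python
import math

def skill_scores(analysis, observations):
    """ME, MAE, MSE and RMSE of analysis minus observations."""
    diffs = [y - o for y, o in zip(analysis, observations)]
    n = len(diffs)
    mse = sum(d * d for d in diffs) / n
    return {
        "ME": sum(diffs) / n,                    # mean error (bias)
        "MAE": sum(abs(d) for d in diffs) / n,   # mean absolute error
        "MSE": mse,                              # mean square error
        "RMSE": math.sqrt(mse),                  # root mean square error
    }
```

The same function can be applied either to all locations at once for the global scores or to the values at a single location.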

6.1. A Cautionary Note 

Data assimilation can easily cause a serious violation of conservation of
mass if total mass, or total reactive nitrogen (NOy = N + NO + NO2 + 2N2O5 +
HONO + HNO3 + HO2NO2 + CH3ONO2 + CH3O2NO2 + ClNO2 + ClONO + ClONO2),
chlorine (ClOy = Cl + 2Cl2 + ClO + 2Cl2O2 + ClNO2 + ClONO + ClONO2 + HCl +
HOCl + CH3OCl + BrCl), bromine (BrOy = Br + 2Br2 + BrO + BrONO + BrONO2 +
HBr + HOBr + BrCl), or hydrogen (Hy = 2H2 + 2H2O + 4CH4) are not included as
control variables. To overcome this, at the start of each timestep we note
the NOy, ClOy, BrOy and Hy, perform the assimilation, and then renormalise
the NOy, ClOy, BrOy and Hy. If this is not done totally unrealistic analyses
can easily result.
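
The renormalisation step can be sketched for a single family as follows. The text does not say how the renormalisation is carried out, so the proportional scaling used here is an assumption, and the function names are hypothetical. Species that belong to more than one family (for example ClONO2, which counts towards both NOy and ClOy) would need more care than this sketch provides.

```python
# Nitrogen atoms contributed by each species to NOy, following the
# definition of NOy given in the text.
NOY_WEIGHTS = {
    "N": 1, "NO": 1, "NO2": 1, "N2O5": 2, "HONO": 1, "HNO3": 1,
    "HO2NO2": 1, "CH3ONO2": 1, "CH3O2NO2": 1,
    "ClNO2": 1, "ClONO": 1, "ClONO2": 1,
}

def family_total(conc, weights):
    """Weighted sum of the family members present in conc."""
    return sum(w * conc.get(species, 0.0) for species, w in weights.items())

def renormalise(conc_after, target_total, weights):
    """Scale the family members so their total matches target_total.

    Assumption: every member is scaled by the same factor; species
    outside the family are left unchanged.
    """
    scale = target_total / family_total(conc_after, weights)
    return {species: (c * scale if species in weights else c)
            for species, c in conc_after.items()}

# Usage: note the total before the assimilation, restore it afterwards.
before = {"NO": 1.0, "NO2": 2.0, "HNO3": 5.0, "N2O5": 1.0, "O3": 100.0}
after = {"NO": 1.2, "NO2": 2.4, "HNO3": 6.0, "N2O5": 1.2, "O3": 90.0}
conserved = renormalise(after, family_total(before, NOY_WEIGHTS), NOY_WEIGHTS)
```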

An example of mass conservation affecting the analyses can be seen in
Figure 1, where there is a noticeable bias in the H2O analyses. This is
because the total Hy is known quite accurately, and consequently the
available observations of H2O and CH4 cannot simultaneously be correct. This
difference between the observations and analyses is highlighted in the H2O
quantile-quantile plot shown in Figure 2, and in the vertical profiles of
bias, O-F, and A-F shown in Figure 1. The analyses have therefore conserved
mass by slightly reducing the levels of H2O and CH4, as can be seen in
Figures 1 and 2. The adjustment in CH4 is less than that in H2O as CH4 has
the lower observation uncertainty.

A similar situation is found in the partitioning of reactive nitrogen. If we
examine the vertical profiles of NO, NO2, N2O5, HNO3 and ClONO2 in Figure 1
we see that between 10 and 30 mb there is a bias in all these species. For
NO2, N2O5, HNO3 and ClONO2 the analysis values are all less than the
observations. This is because the total NOy in this region is accurately
known and the sum of the observed NO2, N2O5, HNO3 and ClONO2 would
considerably exceed the known NOy. Consequently, the analysis has slightly
reduced the concentrations of NO2, N2O5, HNO3 and ClONO2 to ensure NOy
conservation. In the case of ClONO2 it is very likely that there is an
observational bias, as ClONO2 is also a significant component of the ClOy
family for which we also have HCl observations from HALOE, and between 10
and 30 mb there is no significant bias in HCl. Therefore the assimilated
ClONO2 concentrations, which have a bias relative to the observed ClONO2
concentrations, are consistent with the observed and analysed HCl
concentrations.

7. A Case Study 

Let us now consider a case study from January 1992 where the Kalman filter
chemical data assimilation system described above was used to analyse a
vertical atmospheric profile. The vertical profile was at an equivalent PV
latitude ($\phi_e$) of 55°S and consisted of 21 potential temperature
($\theta$) levels spaced equally in log($\theta$) between 400 K and 2000 K.
This equivalent latitude was chosen as it was well observed during January
1992 by the Halogen Occultation Experiment (HALOE), the Microwave Limb
Sounder (MLS), and the Cryogenic Limb Array Etalon Spectrometer (CLAES)
aboard the Upper Atmosphere Research Satellite (UARS). In addition,
considering just one vertical profile allows a detailed examination of the
diurnal cycle in species such as NO and NO2.

Figure 1 shows vertical profiles of the chemical analyses produced by data
assimilation for O3, N2O, NO, NO2, N2O5, HNO3, ClONO2, HCl, H2O and CH4
overlaid with their pseudo observations. Shown on the same horizontal scale
are vertical profiles of the representativeness error, the observation error
and the analysis error. In a separate panel there are vertical profiles of
the bias between the analyses and the pseudo observations, the analysis
increment, (O-F), and (A-F). These quantities are all defined in Section 6
above.

Several points are noteworthy. As would be expected, the analysis error is
nearly always less than the combination of the observation and
representativeness errors (Figure 1). The only exception to this is for some
parts of the NO and NO2 vertical profiles. In each case it occurs above
35 km, where there is an inconsistency between the observed NO, NO2, and O3
and the theoretical knowledge encapsulated in the model. In this region the
photochemical theory of NOx is well known and such an inconsistency did not
occur when ATMOS data were used. It is therefore very likely that there is a
bias in either, or both, the CLAES NO2 observations or the HALOE NO
observations. Due to the close chemical coupling between O3 and NOx this has
also led to the largest ozone bias occurring above 40 km. The assimilation
has combined all the information available and highlighted the inconsistency
by the larger (O-F) and (A-F) values and by increasing the analysis
uncertainty.

The biases in NO2, N2O5, HNO3 and ClONO2 between 10 and 30 mb to ensure NOy
conservation have already been considered in Section 6.1 above.

Figure 2 shows quantile-quantile plots of observations against analyses. A
quantile-quantile plot is useful for determining whether two samples come
from the same distribution (whether normally distributed or not). The
quantile-quantile plot has three graphical elements. The pluses are the
quantiles of each sample. The number of pluses is the number of data values
in the smaller sample. The solid line joins the 25th and 75th percentiles of
the samples. The dashed line extends the solid line to the extent of the
sample.
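
The elements of such a plot can be computed as sketched below. The quantile convention (linear interpolation between sorted values, with plotting positions at $(i + 0.5)/n$) is an assumption, since several conventions are in common use and the text does not say which one was applied. The function names are hypothetical.

```python
def quantile(sorted_data, p):
    """Quantile of already-sorted data at probability p (0 <= p <= 1),
    by linear interpolation between neighbouring values."""
    pos = p * (len(sorted_data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    return sorted_data[lo] + (pos - lo) * (sorted_data[hi] - sorted_data[lo])

def qq_points(sample_a, sample_b):
    """One (a, b) quantile pair per data value in the smaller sample."""
    a, b = sorted(sample_a), sorted(sample_b)
    n = min(len(a), len(b))
    probs = [(i + 0.5) / n for i in range(n)]
    return [(quantile(a, p), quantile(b, p)) for p in probs]

def quartile_line(sample_a, sample_b):
    """Slope and intercept of the line through the 25th and 75th
    percentile points; the two quartiles of sample_a must differ."""
    a, b = sorted(sample_a), sorted(sample_b)
    x1, x3 = quantile(a, 0.25), quantile(a, 0.75)
    y1, y3 = quantile(b, 0.25), quantile(b, 0.75)
    slope = (y3 - y1) / (x3 - x1)
    return slope, y1 - slope * x1
```

If the two samples come from the same distribution, the points fall close to the quartile line; systematic departures in the wings indicate differences in the tails.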

The quantile-quantile plots for O3, NO, N2O, HCl and CH4 all show a good
straight line relationship. This means that the shapes of the observation
and analysis probability distribution functions (PDFs) are the same to a
very good approximation. The quantile-quantile plot for ClONO2 shows a
disagreement in the dashed line region, i.e. on the wings of the plot beyond
the 75th percentile, and the plots for NO2, HNO3 and H2O show some minor
discrepancies in the solid line region; these relate to the conservation
issues mentioned above. The quantile-quantile plot for N2O5 shows the
biggest discrepancy; the cause of this can be seen in Figure 1, where there
is a large bias between the analyses and observations, again related to the
conservation issues mentioned above.

The left hand column of plots in Figures 3 and 4 shows one diurnal cycle of
the chemical analyses produced by data assimilation for a vertical profile
at an equivalent PV latitude ($\phi_e$) of 55°S consisting of 21 potential
temperature ($\theta$) levels spaced equally in log($\theta$) between 400 K
and 2000 K, overlaid with the raw observations. The right hand column shows
the corresponding analysis uncertainty overlaid with the observational
uncertainty. As would be expected, the analysis uncertainty is less than the
observational uncertainty, as information propagates between variables and
also comes from our a priori and theoretical description of the system.

It is noteworthy that the time variation in the analysis uncertainty is very
different from species to species. Some species have very little change in
their uncertainty, whereas species such as NO and NO2 have a strong diurnal
cycle in their uncertainty. Yet other species without a significant diurnal
cycle, such as HCl, are affected by using observations of NO and NO2. This
shows the propagation of information between species within data
assimilation and can be seen clearly in Figures 3 and 4. For example, the
uncertainties of NO and HCl (right hand column of figures) are both affected
by the observations of NO2.

8. Summary 

This paper gives a detailed description of a Kalman filter 
chemical data assimilation system, and an example of its 
use from January 1992. The system is designed to aid in 
the analysis and quality control of atmospheric observations 
made by remote sensing and in-situ instruments. Quality 
control has been found to be an essential part of the assim- 
ilation. 

The assimilation has performed well and highlighted likely inconsistencies
(biases) in the NO, NO2, N2O5, HNO3 and ClONO2 observations between 10 and
30 mb, in O3 and NOx above 40 km, and in H2O and CH4 throughout much of the
stratosphere. Such inconsistencies were not encountered when using high
quality ATMOS data and thus show the value of chemical data assimilation as
part of the validation of remotely sensed chemical data. We hope to use this
system in the validation of ENVISAT data.

Figure 1. Vertical profiles of the chemical analyses produced by data
assimilation for O3, N2O, NO, NO2, N2O5, HNO3, ClONO2, HCl, H2O and CH4
overlaid with their pseudo observations. Shown on the same horizontal scale
are vertical profiles of the representativeness error, the observation error
and the analysis error. In a separate panel there are vertical profiles of
the bias between the analyses and the pseudo observations, the analysis
increment, (O-F), and (A-F).

Figure 2. Quantile-quantile plots of observations against analyses for O3,
N2O, NO, NO2, N2O5, HNO3, ClONO2, HCl, H2O and CH4. A quantile-quantile plot
is useful for determining whether two samples come from the same
distribution (whether normally distributed or not). The quantile-quantile
plot has three graphical elements. The pluses are the quantiles of each
sample. The number of pluses is